Adding more Threadblock Tiles for Mixed-input TensorOp (BF16 * S8) in cutlass_library #1132

manishucsd · 2023-10-09T20:09:11Z

This PR adds more threadblock (CTA) tile shapes to cutlass_library. More ThreadblockShapes are needed so that autotuning can choose the most performant tile shape for mixed-input GEMMs.

cutlass_library

For cutlass_library, it adds more ThreadblockShapes to two GenerateSM80* functions:

GenerateSM80_TensorOp_16816_mixed_input_upcast_a : For upcast on operand A (e.g. S8 * BF16)
GenerateSM80_TensorOp_16816_mixed_input_upcast_b : For upcast on operand B (e.g. BF16 * S8)
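To illustrate how the two function names relate to operand types, here is a minimal sketch. It assumes the narrower operand is the one that gets upcast, which matches the S8 * BF16 and BF16 * S8 examples above. The helpers `upcast_operand` and `generator_name` are hypothetical and are not part of cutlass_library; the real generator does not dispatch this way.

```python
# Bit widths of the two element types named in this PR.
BITS = {"s8": 8, "bf16": 16}


def upcast_operand(a_dtype: str, b_dtype: str) -> str:
    """Return "a" or "b": the narrower operand, which needs upcasting.

    Hypothetical helper for illustration only.
    """
    if BITS[a_dtype] == BITS[b_dtype]:
        raise ValueError("not a mixed-input GEMM: operands have equal width")
    return "a" if BITS[a_dtype] < BITS[b_dtype] else "b"


def generator_name(a_dtype: str, b_dtype: str) -> str:
    """Map an (A, B) dtype pair to the matching generator function name."""
    suffix = upcast_operand(a_dtype, b_dtype)
    return f"GenerateSM80_TensorOp_16816_mixed_input_upcast_{suffix}"


if __name__ == "__main__":
    print(generator_name("s8", "bf16"))   # A is S8, so A is upcast
    print(generator_name("bf16", "s8"))   # B is S8, so B is upcast
```

Only the two dtype pairs mentioned in this PR are covered; other mixed-input combinations would need their own entries in `BITS`.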

Unit tests

More tile shapes need different warp configurations, so this PR adds more device- and warp-level tests to ensure that those configurations are functionally solid (and remain solid).
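The link between tile shapes and warp configurations can be sketched as follows. The sketch assumes the warp tile is the threadblock tile divided by the warp count along M, N, and K, and that each warp tile must be a multiple of the m16n8k16 instruction shape (the "16816" in the generator names). The candidate shapes below are illustrative only and are not the ones added by this PR; `warp_shape` and `is_valid_config` are hypothetical helpers.

```python
from typing import Tuple

Shape = Tuple[int, int, int]

# m16n8k16 Tensor Core instruction shape (assumed from the "16816" naming).
INSTRUCTION_SHAPE: Shape = (16, 8, 16)


def warp_shape(threadblock: Shape, warp_count: Shape) -> Shape:
    """Divide the threadblock tile evenly among warps along M, N, K."""
    for tb, wc in zip(threadblock, warp_count):
        if tb % wc != 0:
            raise ValueError(f"{threadblock} is not divisible by {warp_count}")
    return tuple(tb // wc for tb, wc in zip(threadblock, warp_count))


def is_valid_config(threadblock: Shape, warp_count: Shape) -> bool:
    """Check that each warp tile is a whole number of MMA instructions."""
    try:
        ws = warp_shape(threadblock, warp_count)
    except ValueError:
        return False
    return all(w % i == 0 for w, i in zip(ws, INSTRUCTION_SHAPE))


if __name__ == "__main__":
    # Illustrative candidates only (threadblock shape, warp count).
    candidates = [
        ((128, 128, 64), (2, 2, 1)),
        ((64, 128, 64), (2, 2, 1)),
        ((16, 64, 64), (2, 2, 1)),  # warp M would be 8, below the MMA M of 16
    ]
    for tb, wc in candidates:
        print(tb, wc, is_valid_config(tb, wc))
```

A check like this only covers shape arithmetic; functional correctness of each configuration still needs the device- and warp-level tests.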

.gitignore

python/cutlass_library/generator.py

test/unit/gemm/warp/gemm_mixed_input_sm80.cu

Adding more tiles in the cutlass_library for mixed-input support.

4dc7ee0

manishucsd changed the title ~~Adding more Threadblock Tiles for Mixed-input TensorOp in cutlass_library~~ Adding more Threadblock Tiles for Mixed-input TensorOp (BF16 * S8) in cutlass_library Oct 9, 2023

hwu36 reviewed Oct 10, 2023

.gitignore (resolved)

hwu36 reviewed Oct 10, 2023

python/cutlass_library/generator.py (resolved)

hwu36 reviewed Oct 10, 2023

test/unit/gemm/warp/gemm_mixed_input_sm80.cu (outdated, resolved)

Manish Gupta added 2 commits October 11, 2023 05:46

fix rebase issue

4a0d3e7

more tiles to upcast a

f1aa199

manishucsd mentioned this pull request Oct 13, 2023

[FEA] De-quantization and int4 support for mixed dtypes GEMM #1122

Closed

hwu36 approved these changes Oct 13, 2023


hwu36 merged commit 757275f into NVIDIA:main Oct 13, 2023
